Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Extraction

Authors: Béatrice Daille, Helena Blancafort

Research in Computing Science, Vol. 70, pp. 173-186, 2013.

Abstract: In this paper, we present two terminology extraction tools in order to compare a knowledge-poor and a knowledge-rich approach. Both tools process single- and multi-word terms and are designed to handle multilingualism. We run an evaluation on six languages and two different domains using crawled comparable corpora and hand-crafted reference term lists. We discuss three main results for terminology extraction. The first two evaluation scenarios concern the knowledge-rich framework. First, we compare performance for each language depending on the ranking applied: specificity score vs. number of occurrences. Second, we examine how relevant term variant identification is for improving ranking precision in each language. The third evaluation scenario compares both tools and demonstrates that a probabilistic term extraction approach, developed with minimal effort, achieves satisfactory results when compared to a rule-based method.

PDF: Knowledge-poor and Knowledge-rich Approaches for Multilingual Terminology Extraction